NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

APOLLO: SGD-like Memory AdamW-level Performance

Zhu, Hanqing; Zhang, Zhenyu; Cong, Wenyan; Liu, Xi; Park, Sem; Chandra, Vikas; Long, Bo; Pan, David Z; Wang, Zhangyang; Lee, Jinwon (May 2025, Conference on Machine Learning and Systems (MLSys))

Full Text Available
DNN-based monaural speech enhancement using alternate analysis windows for phase and magnitude modification

https://doi.org/10.21437/Interspeech.2024-2244

Liu, Xi; Hansen, John HL (September 2024, ISCA)

In recent decades, considerable research has been devoted to speech enhancement leveraging the short-term Fourier transform (STFT) analysis. As speech processing technology evolves, the significance of phase information in enhancing speech intelligibility becomes more noticeable. Typically, the Hanning window has been widely employed as analysis window in STFT. In this study, we propose the Chebyshev window for phase analysis, and the Hanning window for magnitude analysis. Next, we introduce a novel cepstral domain enhancement approach designed to robustly reinforce the harmonic structure of speech. The performance of our model is evaluated using the DNS challenge test set as well as the naturalistic APOLLO Fearless Steps evaluation set. Experimental results demonstrate that the Chebyshev-based phase solution outperforms the Hanning option for in phase-aware speech enhancement. Furthermore, the incorporation of quefrency emphasis proves effective in enhancing overall speech quality.
more » « less
Full Text Available
APOLLO: SGD-like Memory, AdamW-level Performance

Zhu, Hanqing; Zhang, Zhang; Cong, Wenyan; Liu, Xi; Park, Sem; Chandra, Vikas; Long, Bo; Zan, David Z; Wang, Zhangyang; Lee, Jinwon (February 2025, MLSys 2025)

Large language models (LLMs) are notoriously memory-intensive during training, particularly with the popular AdamW optimizer. This memory burden necessitates using more or higher-end GPUs or reducing batch sizes, limiting training scalability and throughput. To address this, various memory-efficient optimizers have been proposed to reduce optimizer memory usage. However, they face critical challenges: (i) reliance on costly SVD operations; (ii) significant performance trade-offs compared to AdamW; and (iii) still substantial optimizer memory overhead to maintain competitive performance. In this work, we identify that AdamW's learning rate adaptation rule can be effectively coarsened as a structured learning rate update. Based on this insight, we propose Approximated Gradient Scaling for Memory-Efficient LLM Optimization (APOLLO), which approximates learning rate scaling using an auxiliary low-rank optimizer state based on pure random projection. This structured learning rate update rule makes APOLLO highly tolerant to further memory reductions while delivering comparable pre-training performance. Even its rank-1 variant, APOLLO-Mini, achieves superior pre-training performance compared to AdamW with SGD-level memory costs. Extensive experiments demonstrate that the APOLLO series performs on-par with or better than AdamW, while achieving greater memory savings by nearly eliminating the optimization states of AdamW. These savings provide significant system-level benefits: (1) Enhanced Throughput: 3x throughput on an 8xA100-80GB setup compared to AdamW by supporting 4x larger batch sizes. (2) Improved Model Scalability: Pre-training LLaMA-13B with naive DDP on A100-80GB GPUs without system-level optimizations. (3) Low-End GPU Friendly Pre-training: Pre-training LLaMA-7B on a single GPU using less than 12 GB of memory with weight quantization.
more » « less
Full Text Available
DistDNAS: Search Efficient Feature Interactions within 2 Hours

https://doi.org/10.1109/BigData62323.2024.10825061

Zhang, Tunhou; Wen, Wei; Fedorov, Igor; Liu, Xi; Zhang, Buyun; Han, Fangqiu; Chen, Wen-Yen; Han, Yiping; Yan, Feng; Li, Hai; et al (December 2024, IEEE)

Full Text Available
DistDNAS: Search Efficient Feature Interactions within 2 Hours

Zhang, Tunhou; Wen, Wei; Fedorov, Igor; Liu, Xi; Zhang, Buyun; Han, Fangqiu; Chen, Wen-Yen; Han, Yiping; Han, Feng; Li, Hai; et al (December 2024, IEEE International Conference on Big ata (IEEE BigData))

Full Text Available
Dual-Path Minimum-Phase and All-Pass Decomposition Network for Single Channel Speech Dereverberation

https://doi.org/10.1109/ICASSP48485.2024.10446719

Liu, Xi; Chen, Szu-Jui; Hansen, John_H L (April 2024, IEEE)

With the development of deep neural networks (DNN), many DNN-based speech dereverberation approaches have been proposed to achieve significant improvement over the traditional methods. However, most deep learning-based dereverberation methods solely focus on suppressing time-frequency domain reverberations without utilizing cepstral domain features which are potentially useful for dereverberation. In this paper, we propose a dual-path neural network structure to separately process minimum-phase and all-pass components of single channel speech. First, we decompose speech signal into minimum-phase and all-pass components in cepstral domain, then Conformer embedded U-Net is used to remove reverberations of both components. Finally, we combine these two processed components together to synthesize the enhanced output. The performance of proposed method is tested on REVERB-Challenge evaluation dataset in terms of commonly used objective metrics. Experimental results demonstrate that our method outperforms other compared methods.
more » « less
Full Text Available
Fearless Steps Apollo: Team Communications Based Community Resource Development for Science, Technology, Education, and Historical Preservation

https://doi.org/10.1109/ICASSP48485.2024.10446811

Hansen, John HL; Joglekar, Aditya; Shekar, Meena_M C; Chen, Szu-Jui; Liu, Xi (April 2024, IEEE)

The Fearless Steps Apollo (FS-APOLLO) resource is a collection of 150,000 hours of audio, associated meta-data, and supplemental speech technology infrastructure intended to benefit the (i) speech processing technology, (ii) communication science, team-based psychology, and (iii) education/STEM, history/preservation/archival communities. The FS-APOLLO initiative which started in 2014 has since resulted in the preservation of over 75,000 hours of NASA Apollo Missions audio. Systems created for this audio collection have led to the emergence of several new Speech and Language Technologies (SLT). This paper seeks to provide an overview of the latest advancements in the FS-Apollo effort and explore upcoming strategies in big-data deployment, outreach, and novel avenues of K-12 and STEM education facilitated through this resource.
more » « less
Full Text Available
InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction

https://doi.org/10.1145/3746252.3761527

Zeng, Zhichen; Liu, Xiaolong; Hang, Mengyue; Liu, Xiaoyi; Zhou, Qinghai; Yang, Chaofei; Liu, Yiqun; Ruan, Yichen; Chen, Laming; Chen, Yuxin; et al (November 2025, ACM)

Full Text Available
Assessing Transboundary Impacts of Energy-Driven Water Footprint on Scarce Water Resources in China: Catchments under Stress and Mitigation Options

https://doi.org/10.1021/acs.est.2c08006

Liu, Xi; Du, Huibin; Zhang, Xin; Feng, Kuishuang; Zhao, Xu; Zhong, Honglin; Zhang, Ning; Chen, Zhenni (July 2023, Environmental Science & Technology)

Full Text Available
Genomic structural variation is associated with hypoxia adaptation in high-altitude zokors

https://doi.org/10.1038/s41559-023-02275-7

An, Xuan; Mao, Leyan; Wang, Yinjia; Xu, Qinqin; Liu, Xi; Zhang, Shangzhe; Qiao, Zhenglei; Li, Bowen; Li, Fang; Kuang, Zhuoran; et al (February 2024, Nature Ecology & Evolution)

Full Text Available

« Prev Next »

Search for: All records